A neural network with expert system functionality.



A method is disclosed for performing a variety of
expert system functions on any trained feedforward
neural network. These functions include decision-making,
explanation, computation of confidence measures, and
intelligent direction of information acquisition.
Additionally, the method converts the knowledge implicit
in such a network into a set of explicit if-then rules.
Field of the Invention



This invention relates to neural networks and 
more particularly to a class of neural networks 
known as feedforward networks. 

Background of the Invention 

Neural networks are composed of a series of 
interconnected neuron-like processing elements 
(PEs). The strengths of the connections between 
PEs are represented by weights. Each PE stores a 
value known as a state, which is either specified by 
input data or computed from the PE's inputs and
weights, using its transfer function. Typically, the 
transfer function is applied to the PE's net-input, 
the weighted sum of its inputs. Collectively, states 
are used to represent information in the short term.
Long-term information or learning is represented by
the weights. Neural networks learn from examples 
by modifying their weights. Once learning or train- 
ing is completed, these networks can perform a 
variety of computational tasks. 
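
The computation of a single PE's state described above can be sketched as follows. This is a minimal illustration only: the function name `pe_state` and the choice of `math.tanh` as the transfer function are assumptions made for the example, not details taken from this description.

```python
import math


def pe_state(inputs, weights, transfer=math.tanh):
    """Return the state of one PE.

    The net-input is the weighted sum of the PE's inputs; the state is
    the transfer function applied to that net-input. The tanh default is
    an illustrative assumption.
    """
    net_input = sum(x * w for x, w in zip(inputs, weights))
    return transfer(net_input)
```

Any other transfer function can be passed through the `transfer` argument.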

Much of the use of neural networks is focused 
on feedforward networks. These networks have an 
architecture consisting of a layer of input nodes, a 
layer of output nodes, and optionally, some number 
of hidden layers in between. Input data is repre- 
sented using the states of the input layer. The 
network's response to that data is represented by 
the states of the output layer. The feedforward 
nature of these networks results from the fact that, 
during an iteration, the computations flow from the
input layer, through any hidden layers, to the out- 
put layer. This architecture allows the network to 
learn to map input states to output states approxi- 
mating the correct response to the input. For exam- 
ple, if the input states represent the symptoms 
presented by a medical patient, the network is able 
to produce output states representing an estimation 
of the correct diagnosis for those symptoms. 

One of the hindrances to wider acceptance of 
neural networks is the fact that they function largely 
as black boxes. It is often difficult to understand 
why a specific set of inputs produced a particular 
output. This difficulty is a result of the fact that the 
network's 'knowledge' is encoded in the weights 
associated with a complex web of interconnections. 
It is desirable to find an explanation method for
neural networks, that is, a way to explain a particular
output in terms of the network inputs. For example,
if a network is used to make loan decisions, it 
would be desirable to explain those decisions in 
terms of the input data describing the applicant. An 
explanation of this kind is required for negative loan 
decisions. 

The black box problem is not as serious for
two-layer feedforward networks, i.e., those without
hidden layers. In these networks, the relationship
between the inputs and outputs is straightforward.
The magnitude and direction of the relationship
between the states of an input PE and an output
PE are given by the weight of the connection
between the two PEs. Because the relationship
between each input and output is fixed, these simple
networks cannot capture variable relationships
between inputs and outputs, such as non-monotonic
relationships. Nor can they capture the interdependencies
among inputs. That is to say, they cannot
implement mappings in which the effect of some
input on some output is dependent on the values of
other inputs. They can learn only linear mappings,
i.e., mappings where each output is proportional to
a weighted sum of the inputs. Thus these networks
are restricted to learning only a limited subset of
the relationships that exist in the real world.

Networks with hidden layers are needed to
learn nonlinear mappings, including non-monotonic
relationships between inputs and outputs and
interdependencies among inputs. Since there is no
straightforward relationship between inputs and
outputs in these networks, explanation is a difficult
problem. One attempt at explanation uses sensitivity
analysis. This technique involves changing the
value of an input, iterating the network, and noting
whether there is any meaningful change in the
network's output. Using the medical domain as an
example again, sensitivity analysis would involve
changing one of the input symptoms and noting
whether there is a change in the network's diagnostic
output.
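
Sensitivity analysis as described above can be sketched in a few lines. The sketch assumes the trained network is available as a callable that maps a list of input states to a single output state; `toy_network` is a hypothetical stand-in used only to make the example runnable.

```python
def sensitivity(network, inputs, index, new_value):
    """Change in the network output when one input is replaced.

    `network` is assumed to be a callable taking a list of input states
    and returning one output state.
    """
    baseline = network(inputs)
    perturbed = list(inputs)       # copy so the caller's data is untouched
    perturbed[index] = new_value
    return network(perturbed) - baseline


def toy_network(inputs):
    """Hypothetical stand-in for a trained network: a fixed weighted sum."""
    return 0.5 * inputs[0] - 0.25 * inputs[1]
```

A result near zero suggests the changed input has little effect on the output in the context of the other input values.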

Explanation is also an issue in the field of
expert systems. These systems are often compared
to neural networks because the two technologies
attempt to solve some of the same problems,
namely: classification; prediction; and decision-making.
Explanation is more straightforward in expert
systems than in neural networks, because the
'knowledge' in an expert system is more explicit in
that it is contained in a set of if-then rules known as
a rulebase. In addition to explanation, expert systems
possess other desirable capabilities not found
in neural networks of the prior art. These capabilities
include the ability to determine when enough
input information is present to make conclusions; to
provide intelligent direction to the user's information
acquisition; and to calculate confidence measures
to accompany decisions.

Unfortunately, expert systems lack the ability to
learn from examples, which is the most appealing
feature of neural networks. Although some attempts
have been made to imbue expert systems with a
learning capability, they still rely primarily on
handcrafted rules as their source of 'knowledge'. Thus,
building an expert system to solve a particular
problem requires finding a human expert in the
problem domain, translating his knowledge into
if-then rules, then debugging the rulebase.

Clearly, it is desirable to combine the learning 
ability of neural networks with the explanation and 
other capabilities of expert systems. One known 
attempt at such a combination involves an expert 
system which uses a neural network as its source 
of 'knowledge', and is thus able to take advantage 
of neural network learning. In addition to expert 
system capabilities such as explanation, this hybrid 
system includes an optional facility for converting 
the knowledge contained in the network into a set
of rules. This system is described in the literature
(see, for example, U.S. Patent 4,730,259).

Unfortunately, the techniques used in this hybrid
system of the prior art have several important
limitations. Most significantly, they are applicable 
only to perceptron networks, also known as linear 
discriminant networks. With the possible exception 
of the input layer, the PEs of these networks are 
limited to ternary states, i.e., there are only three
possible state values, corresponding roughly to 
TRUE, FALSE, and unknown. The result is that 
perceptron networks cannot compute mappings as 
precisely as continuous-state networks, i.e., net- 
works whose states are not limited to a set of 
discrete values. Even more important than the loss 
of precision is the fact that perceptron networks 
cannot be trained with backpropagation learning, 
which is the most popular training method for net- 
works with hidden layers. 

The explanation techniques of the prior art hybrid
system also have limitations. For example,
explanations can be generated only for conclusions
since there is no method for explaining why the
system is leaning toward one output or another
prior to a definitive conclusion. Explanations take
the form of rules, with the conditions (inputs or
hidden layer states) on the left-hand side of the 
rule serving as an explanation for the conclusion on 
the right-hand side. Those conditions included in 
the rule make a larger positive contribution to the 
conclusion than those omitted. However, no precise 
measurement of contributions is produced for ei- 
ther the included or omitted conditions. 

The prior art hybrid system is also limited in 
the way in which it directs the user's information 
acquisition. The system chooses a single input 
whose value is unknown but important and asks the 
user to give a value for that input. However, the 
system provides no information about the relative 
importance of the other unknown inputs. 

Summary of the Invention 



Accordingly, the present invention is a system
which allows expert system functionality, including
explanation, to be added to feedforward neural
networks. The invention overcomes the deficiencies
in the prior art. In particular, the present invention
overcomes the deficiencies in the existing technology
for continuous-state, feedforward networks in
that it can determine when enough input information
is present to make conclusions; it can precisely
explain its decisions and why some of those
decisions become conclusions; it can calculate
confidence measures to accompany its decisions;
and it can compute the relative importance of the
inputs with unknown values. The present invention
is different from existing expert system technology
because it can learn from examples, rather than
relying on humans to give it knowledge in the form
of rules. Explanation in the present invention is
different from the existing technology for explanation
in two-layer networks because the invention
handles nonlinear relationships. The present invention
is different from the hybrid system of the prior
art because it overcomes the deficiencies in that
system; for example, the invention operates on the
powerful class of continuous-state feedforward networks.

Description of the Drawings

Figure 1 is a diagram of the major components,
including inputs and outputs, according to the
present invention;

Figure 2 is a flowchart of the operation of the
Inference module according to the present invention;

Figures 3A and 3B are flowcharts of the operation
of the Conclusion Explanation according to
the present invention;

Figure 4 is a flowchart of the operation of the
Decision Explanation according to the present
invention;

Figure 5 is a flowchart of the operation of the
Intelligent Knowledge Acquisition module according
to the present invention;

Figure 6 is a diagram of the neural network
architecture and data flow therethrough according
to the present invention for operation on a
loan-scoring problem;

Figure 7 is a description of the thirteen input
variables in the loan-scoring problem, along with
sample values, in the operation illustrated in
Figure 6;

Figure 8 is an example of output for Decision
Explanation in the loan-scoring problem illustrated
in Figures 6 and 7;

Figure 9 is an example of output for Conclusion
Explanation in the loan-scoring problem illustrated
in Figures 6 and 7;

Figure 10 is an example of output for Intelligent
Knowledge Acquisition in the loan-scoring problem
illustrated in Figures 6 and 7; and

Figure 11 is an example of output for Rule
Generation in the loan-scoring problem illustrated
in Figures 6 and 7.

Description of the Preferred Embodiment 

To understand the present invention, it is im- 
portant to understand the concept of input and 
output variables. A user of the present invention 
deals with these variables rather than dealing di- 
rectly with the input and output states which under- 
lie the variables. The decision-making process ac- 
cording to the present invention is viewed as map- 
ping input variables to one or more output vari- 
ables. Given values for some or all of the input 
variables, the present invention determines the values
of the output variables. Those values correspond
directly to the decisions made according
to the present invention. For example, if there were
an output variable which represented a loan de- 
cision, a value of TRUE might correspond to a 
decision to approve the loan. 

The present invention has three types of input
variables, namely: continuous; Boolean; and symbolic.
A continuous input variable can have any
numerical value and it is represented by a single
PE whose state is equal to that value. If the value
of the variable is unknown, the state is set to some
default value, typically the mean value of the variable.
A Boolean input variable can have a value of
either TRUE or FALSE. It is represented by a
single PE with a state drawn from the set {-B, M, B},
corresponding to {FALSE, UNKNOWN, TRUE}. B
is typically set to 1.0 and M is typically set to 0.0.
A symbolic input variable draws its value from a
finite set of symbols. An example is the variable
'region', which can have any value in the set
{Northeast, Southeast, Northwest, Southwest}.
Symbolic variables are represented using multiple
PEs and a 1-out-of-n code. A symbolic variable
with n possible values is represented by n PEs,
each of which corresponds to a different value. If
the value of the variable is known, the PE corresponding
to the current value is given a state of
B and the other PEs are given states of -B. If the
value of the symbolic variable is unknown, all the
PEs have states of M. As with Boolean variables, B
is typically set to 1.0 and M is typically set to 0.0.
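
The three input encodings described above can be sketched as follows. The function names and the use of `None` to mark an unknown value are assumptions made for this illustration; B = 1.0 and M = 0.0 are the typical values stated in the text.

```python
B = 1.0          # state representing TRUE / the selected symbol
M = 0.0          # state representing an unknown value
UNKNOWN = None   # illustrative marker for an unknown input value


def encode_continuous(value, mean):
    """One PE whose state equals the value, or the mean when unknown."""
    return [mean if value is UNKNOWN else float(value)]


def encode_boolean(value):
    """One PE with a state drawn from {-B, M, B}."""
    if value is UNKNOWN:
        return [M]
    return [B if value else -B]


def encode_symbolic(value, symbols):
    """1-out-of-n code: one PE per possible symbol."""
    if value is UNKNOWN:
        return [M] * len(symbols)
    if value not in symbols:
        raise ValueError(f"unknown symbol: {value!r}")
    return [B if s == value else -B for s in symbols]
```

For example, the 'region' variable with value Southeast would be encoded over four PEs, with only the second PE set to B.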

Output variables can be Boolean or symbolic.
These types of variables are appropriate for the
decision output of the present invention because
they have categorical values, and decisions are
inherently categorical. During operation in training
mode, the network is presented with Boolean and
symbolic training values, i.e., values corresponding
to the correct decisions for the output variables.
These training values are represented the same as
Boolean or symbolic input values, using states in
the set {-B, M, B}. However, the states produced in
the output layer of a continuous-state feedforward
network are, by definition, continuous and thus not
restricted to {-B, M, B}. Therefore, the output states
are interpreted as being approximations of Boolean
and symbolic values.

When interpreting output states, a symbolic
output variable is given the value corresponding to
the underlying PE with the highest state. That PE is
said to be the selected PE. A Boolean output
variable is given a value according to a user-chosen
Boolean decision threshold. If the underlying
output state is greater than the threshold, the
value of the variable is TRUE. Otherwise, the value
of the variable is FALSE.
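
The interpretation of output states described above can be sketched as follows. The function names are assumptions made for this illustration.

```python
def decide_boolean(state, threshold):
    """TRUE only when the output state is strictly above the threshold."""
    return state > threshold


def decide_symbolic(states, symbols):
    """Return the symbol whose underlying PE has the highest state."""
    selected = max(range(len(states)), key=lambda j: states[j])
    return symbols[selected]
```

Note that a state exactly equal to the threshold yields FALSE, since the text requires the state to be greater than the threshold for a TRUE decision.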

Referring now to the diagram of Figure 1, there
are shown the essential elements of the present
invention. These elements are referred to as components
of the invention and they are divided into
three categories: input, output, and internal components.
One of the input components is the set of
weights 9 that results from training the network with
examples from the chosen problem domain. The
weights 9 encode the 'knowledge' of the network
about the domain. The method of training is independent
of the present invention, so a user can
choose from among the learning algorithms intended
for feedforward networks. Backpropagation
is the most widely used learning algorithm for the
powerful class of continuous-state feedforward networks
with hidden layers, on which the present
invention can operate.

Another major input component is the input
data 11, which specifies the values of the input
variables for each case that is to be analyzed by
the invention. The value of any input variable can
be specified as UNKNOWN for a particular case,
since the present invention is designed to
intelligently handle unknown values.

Still another input component is the input variable
statistics 13, which are applicable only if some
of the input variables are continuous. These statistics
describe the distribution of values for each
continuous input variable. Specifically, the system
requires either the mean and standard deviation for
each such variable, or the minimum and maximum
for each variable. In the latter case, the means are
also needed if (1) they are used as the default
state when the value of a continuous input variable
is unknown; or (2) the explanation module 23 will
be used for Rule Generation 31. The required
distribution statistics can be computed from a data
set, such as the set of training examples or the set
of cases to be analyzed. Alternatively, the statistics
can be estimated.

The diagram of Figure 1 illustrates that the
present invention has three major internal components.
The most fundamental of these components
is the Inference module 15, which is responsible for
three output components, namely: decisions 17;
conclusions 19; and confidence measures 21
(Decisiveness and Certainty). The operation of the
Inference module is illustrated in the flowchart of
Figure 2 with regard to a single output variable.
The module and figure are discussed below.

The Inference module is operated whenever 
the values of input variables are added, deleted, or 
modified, as recorded in block 41 of Figure 2. 
Based on the new input values, the output states of 
the network are updated in block 43. Additionally, 
the module updates high and low bounds on those 
output states in block 45. These bounds are upper 
and lower limits on the possible range of each 
output state, given the known input values and any 
combination of values for some or all of the un- 
known input variables. Computing these bounds 
requires that assumptions be made about the pos- 
sible values of continuous input variables. Details of 
these assumptions and the precise method for 
computing the bounds are explained following this 
overview of the Inference module. 

The updated output states are used to determine
the values of the output variables. Those
values are the decisions 17 of the present inven- 
tion. As previously explained, the method of deter- 
mining output variable values or decisions from 
output states depends upon whether an output 
variable is Boolean or symbolic. If it is Boolean, 
then the underlying output state is compared to the 
user-chosen Boolean decision threshold in block 
49. If the state is greater than the threshold, the 
decision is TRUE in block 51. Otherwise, the decision
is FALSE in block 53. If an output variable is
symbolic, the underlying PE with the highest output 
state is selected in block 59. In block 61, the 
decision is the symbol corresponding to the se- 
lected PE. 

After updating the decision, the present invention
determines whether it can declare the decision
to be a conclusion 19. A conclusion can be
reached when enough of the input values are
known so that no combination of possible values
for the unknown input variables could change the
value of the output variable corresponding to the
decision. For symbolic output variables, this conclusion
condition is checked for in block 63. If the
low bound on the state of the selected PE is
greater than the high bound on the state of each
unselected PE, then that selected PE is guaranteed
to have the highest state given any combination of
values for the unknown inputs. The conclusion condition
is thus met, and the decision is declared to
be a conclusion in block 67. Otherwise, the decision
is said to be tentative in block 65.

For Boolean output variables, the method for
checking the conclusion condition depends on the
decision. If the decision is TRUE, the conclusion
condition is checked in block 55. The condition is
met when the low bound on the state of the underlying
output PE is greater than the Boolean decision
threshold, ensuring that the PE's state will
exceed the threshold given any combination of
values for the unknown inputs. If the decision is
FALSE, the conclusion condition is checked in
block 57. The condition is met when the high
bound on the underlying output state is less than
the decision threshold, ensuring that the state will
be less than the threshold.
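
The conclusion conditions for symbolic and Boolean output variables can be sketched as follows. The function and parameter names are assumptions made for this illustration; the bounds are taken as already computed.

```python
def boolean_conclusion(decision, lo, hi, threshold):
    """Conclusion condition for a Boolean output variable.

    decision: current decision (True or False)
    lo, hi:   low and high bounds on the underlying output state
    """
    if decision:
        return lo > threshold   # state must exceed the threshold in all cases
    return hi < threshold       # state must stay below the threshold in all cases


def symbolic_conclusion(selected, lo_bounds, hi_bounds):
    """Conclusion condition for a symbolic output variable.

    The low bound of the selected PE must exceed the high bound of
    every unselected PE.
    """
    return all(
        lo_bounds[selected] > hi
        for j, hi in enumerate(hi_bounds)
        if j != selected
    )
```

When either function returns False, the decision remains tentative.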

If all input values are known, the conclusion
condition is guaranteed to be met. However, in
real-world domains, it is often the case that some
input values are unknown. In medical diagnosis, for
example, important data may be missing because
it is costly, time-consuming, or even risky to obtain.
The conclusion-generating capability of the present
invention overcomes the problem of missing data.
The invention allows a user to know when he has
enough input data to be sure of the computed
decision.

The state bounds that are used to make conclusions
are also used to compute the Certainty
confidence measure 21 for a decision. Certainty is
a measure of confidence that a conclusion for an
output variable would be the same as the current
decision for that variable. Certainty ranges from 0%
to 100%, reaching the maximum value in block 71
only when a conclusion has been reached. Only
then is it guaranteed that additional inputs won't
change the decision. Before a conclusion is
reached, Certainty must be computed in block 69.
It is computed based on the distance between the
output bounds required for the decision to become
a conclusion and the actual output bounds.

For Boolean output variables, the Certainty
percentage is computed as:

(S(o,lo) - L) * 100 / (T - L)   for TRUE decisions
(U - S(o,hi)) * 100 / (U - T)   for FALSE decisions

where S(o,hi) and S(o,lo) are the high and low
bounds on the state of the underlying output PE o,
U and L are the upper and lower limits respectively
on the output of the transfer function of o, and T is
the Boolean decision threshold.

For symbolic output variables, the Certainty
percentage is computed as:

[{(S(o,lo) - S(o',hi)) / (U - L)} + 1] * 100

where o is the selected output PE and o' is the
unselected PE with the highest high bound.
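
The Certainty computation can be sketched as follows, reading the formulas above as quotients. The function names are assumptions made for this illustration, and the default limits L = -1.0 and U = 1.0 are assumed here only because they match the transfer function range of the loan-scoring example; other networks would supply their own limits.

```python
def certainty_boolean(decision, lo, hi, threshold, lower=-1.0, upper=1.0):
    """Certainty percentage for a Boolean output variable.

    lo, hi:        bounds S(o,lo) and S(o,hi) on the output state
    lower, upper:  limits L and U of the transfer function (assumed defaults)
    """
    if decision:
        if lo > threshold:                 # conclusion reached
            return 100.0
        return (lo - lower) * 100.0 / (threshold - lower)
    if hi < threshold:                     # conclusion reached
        return 100.0
    return (upper - hi) * 100.0 / (upper - threshold)


def certainty_symbolic(selected, lo_bounds, hi_bounds, lower=-1.0, upper=1.0):
    """Certainty percentage for a symbolic output variable."""
    # o' is the unselected PE with the highest high bound.
    rival_hi = max(h for j, h in enumerate(hi_bounds) if j != selected)
    if lo_bounds[selected] > rival_hi:     # conclusion reached
        return 100.0
    return ((lo_bounds[selected] - rival_hi) / (upper - lower) + 1.0) * 100.0
```

Both functions return 100% as soon as the corresponding conclusion condition holds, and otherwise fall back to the distance-based formula.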

A second confidence measure 21, Decisiveness,
measures the strength of a decision and is
computed in block 73. The maximum possible decision
strength is indicated by a Decisiveness value
of 100%. A value near 0% indicates a weak decision.
Decisiveness is computed based on the
distance between output states and decision
boundaries. Specifically, for Boolean output variables,
the Decisiveness percentage equals:

(S(o) - T) * 100 / (U - T)   for TRUE decisions
(T - S(o)) * 100 / (T - L)   for FALSE decisions

where S(o) is the state of the underlying output PE o.

For symbolic output variables, the Decisiveness
percentage equals:

(S(o) - S(o')) * 100 / (U - L)

where o is the selected output PE and o' is the
unselected PE with the highest state.
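
The Decisiveness computation can be sketched in the same way, again reading the formulas as quotients. The function names and the default limits L = -1.0 and U = 1.0 are assumptions made for this illustration.

```python
def decisiveness_boolean(state, threshold, lower=-1.0, upper=1.0):
    """Decisiveness percentage for a Boolean output variable."""
    if state > threshold:                                  # TRUE decision
        return (state - threshold) * 100.0 / (upper - threshold)
    return (threshold - state) * 100.0 / (threshold - lower)  # FALSE decision


def decisiveness_symbolic(states, lower=-1.0, upper=1.0):
    """Decisiveness percentage for a symbolic output variable."""
    ranked = sorted(states, reverse=True)
    selected, runner_up = ranked[0], ranked[1]   # o and o'
    return (selected - runner_up) * 100.0 / (upper - lower)
```

The symbolic version assumes at least two PEs in the choice group, which holds for any symbolic variable with more than one possible value.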

The medical domain provides an example of
the usefulness of the confidence measures described
above. If the present invention were used
to make diagnostic decisions, those decisions
could have life-and-death consequences. Before
trusting a particular decision or acting upon it, a
physician would want to know how much confidence
he should have in that decision. Decisiveness
will tell him how clearly the diagnostic decision
is favored over alternative diagnoses. Certainty
will tell him how likely it is that the decision
would remain the same were he to gather additional
information.

Note that the two confidence measures are
only indirectly related. It is possible to have a high
Certainty value but a low Decisiveness value or
vice versa. For example, the former case occurs
when a weak decision becomes a conclusion.

Turning now to the precise method for computing
the bounds on output states, the method requires
that assumptions be made about the possible
values of continuous input variables. These
assumptions are based on the input variable distribution
statistics 13. If the user chooses to use the
minimum and maximum statistics, the present invention
assumes that the possible values of an
unknown continuous input variable fall between its
minimum and maximum. If the user chooses to use
the mean and standard deviation instead, he must
also specify a positive range value r, which applies
to all continuous input variables. The invention assumes
that the possible values of an unknown
continuous variable i fall in the range defined by
the following limits:

lower limit = Mean(i) - (r * Sd(i))
upper limit = Mean(i) + (r * Sd(i))

where Mean(i) and Sd(i) are the mean value and
standard deviation of variable i.
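
The two ways of deriving the assumed limits for an unknown continuous variable can be sketched as follows. The function names are assumptions made for this illustration.

```python
def limits_from_min_max(minimum, maximum):
    """Assumed range when the minimum and maximum statistics are used."""
    return (minimum, maximum)


def limits_from_mean_sd(mean, sd, r):
    """Assumed range Mean(i) +/- r * Sd(i); r must be positive."""
    if r <= 0:
        raise ValueError("range value r must be positive")
    return (mean - r * sd, mean + r * sd)
```

For instance, a variable with mean 10.0, standard deviation 2.0 and r = 1.5 is assumed to lie between 7.0 and 13.0.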

The bounds on the states of output PEs are
computed using the following recursive equations
for the state bounds of any non-input PE. If p is a
non-input PE, its state bounds are computed as:

S(p,lo) = T(Σ_s C(s,p,lo))
S(p,hi) = T(Σ_s C(s,p,hi))

where T is the transfer function of p and C(s,p,hi)
and C(s,p,lo) are the high and low bounds on the
contributions made to the net-input of p by each of
its sources s. Each s is either a hidden PE, input
variable, or bias (special input PE whose state is
always equal to 1.0).

The type of s (hidden PE, input variable, or bias)
determines how the contribution bounds are
computed. If s is a bias, then:

C(s,p,lo) = C(s,p,hi) = W(s,p)

where W(s,p) denotes the weight of the connection
from s to p.

If s is a hidden PE, then the following equations
are used:

If W(s,p) > 0 then

C(s,p,lo) = W(s,p) * S(s,lo)
C(s,p,hi) = W(s,p) * S(s,hi)

Otherwise

C(s,p,lo) = W(s,p) * S(s,hi)
C(s,p,hi) = W(s,p) * S(s,lo)

If s is an input variable, then the calculation of
contribution bounds depends on the data type of s
and whether s is known or unknown. If s is Boolean
or continuous and known to have a value v, then:

C(s,p,lo) = C(s,p,hi) = v * W(i,p)

where i is the PE underlying s.

If s is Boolean or continuous and its value is
unknown, then the following equations are used:

If W(i,p) > 0 then

C(s,p,lo) = W(i,p) * Min(s)
C(s,p,hi) = W(i,p) * Max(s)

Otherwise

C(s,p,lo) = W(i,p) * Max(s)
C(s,p,hi) = W(i,p) * Min(s)

where Max(s) and Min(s) are the maximum and
minimum possible values of s. If s is Boolean, Max(s)
is B and Min(s) is -B, where B and -B are the
values used to represent TRUE and FALSE respectively.
If s is continuous, Max(s) and Min(s) are the
upper and lower assumed limits on s, as derived
from the distribution statistics.

If s is symbolic and known to have a value
corresponding to PE v in the underlying choice
group, then:

C(s,p,lo) = C(s,p,hi) = C(s,p,v)

where C(s,p,v) is the contribution s makes to the
net-input of p when the value of s corresponds to v.
Specifically,

C(s,p,v) = (W(v,p) - Σ_{i≠v} W(i,p)) * B

where i iterates over the PEs in the underlying
choice group.

If s is symbolic and its value is unknown, then:

C(s,p,lo) = Min(C(s,p,i))
C(s,p,hi) = Max(C(s,p,i))

where Max and Min are computed over all PEs i in
the underlying choice group.
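
The bound computations above can be sketched as follows. This is a simplified illustration under stated assumptions: each Boolean or continuous source is given as a (low, high) pair of state bounds, with low equal to high for a known value; the transfer function is assumed to be monotonically increasing, with `math.tanh` as an illustrative default; and all function names and the weight layout are hypothetical.

```python
import math


def contribution_bounds(weight, lo, hi):
    """Bounds on weight * state when the state lies in [lo, hi]."""
    if weight > 0:
        return weight * lo, weight * hi
    return weight * hi, weight * lo     # a non-positive weight swaps the bounds


def layer_bounds(source_bounds, weights, biases, transfer=math.tanh):
    """State bounds for every PE p in one non-input layer.

    source_bounds: list of (lo, hi) pairs, one per source s
    weights[p][s]: W(s,p); biases[p]: weight from the bias PE to p
    """
    result = []
    for weight_row, bias in zip(weights, biases):
        net_lo = net_hi = bias          # bias contributes W(s,p) to both bounds
        for w, (lo, hi) in zip(weight_row, source_bounds):
            c_lo, c_hi = contribution_bounds(w, lo, hi)
            net_lo += c_lo
            net_hi += c_hi
        result.append((transfer(net_lo), transfer(net_hi)))
    return result


def symbolic_contribution_bounds(group_weights, known_index=None, b=1.0):
    """Contribution bounds of a symbolic variable to the net-input of p.

    group_weights: W(i,p) for each PE i in the choice group
    known_index:   index of PE v if the value is known, else None
    """
    total = sum(group_weights)
    # C(s,p,v) = (W(v,p) - sum of the other weights) * B
    contributions = [(w - (total - w)) * b for w in group_weights]
    if known_index is not None:
        c = contributions[known_index]
        return c, c
    return min(contributions), max(contributions)
```

Applying `layer_bounds` to the input bounds and then to each successive layer's result reproduces the recursion; a symbolic variable's bounds would be added to the net-input bounds directly rather than passed as a (low, high) state pair.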

An example problem is now used to demonstrate
the operation of the Inference module described
directly above. The example involves making
decisions on personal loan applications, based
on individual applicant data. The neural network
architecture and data flow used for this problem
are illustrated in Figure 6. The network is simulated
using computer software, although the present invention
is equally applicable to a network implemented
in hardware. The input layer 203 of the
network contains thirteen PEs, one for each of
thirteen continuous input variables. The input layer
is fully connected to a single hidden layer 205
containing three PEs, which in turn is fully connected
to an output layer 207 consisting of a single
PE. The output PE represents a Boolean variable
whose value corresponds to the loan decision 209.
A TRUE value corresponds to a decision to approve
the loan, and a FALSE value corresponds to
a decision to decline. The output and hidden PEs
have a conventional sigmoid transfer function, with
output in the range [-1.0, 1.0].

Note that the network in Figure 6 is just one 
example of the type of neural network on which the 
present invention can operate. In general, the in- 
vention can operate on a feedforward network with 
any number of hidden layers (including zero), any 
number of PEs in the hidden, input, and output 
layers, and an arbitrary transfer function for a PE 
(different PEs can have different transfer functions). 

The particular network is trained using backpropagation
learning, which modifies the weights
211. The backpropagation learning algorithm, which
is independent of the present invention, involves a
backward data flow in addition to the feedforward
flow depicted in Figure 6. The network of Figure 6
is trained on data 201 which represents examples
of previous applicants whose loan outcomes are
known. Each example is assigned a training value
of either Approve (1.0) or Decline (-1.0), depending
on whether the applicant defaulted on the loan. The
applicant is described by the values of the thirteen
input variables, which are explained in the table of
Figure 7. The table also contains the input values
for an actual applicant in the test database.

Using the network in Figure 6, the present
invention is able to make a conclusion about this
sample applicant, despite two missing input values
in the data described in Figure 7. The concluded
decision is to approve the loan. This decision corresponds
to a TRUE value for the output variable
and is a correct decision in that the applicant
satisfactorily repaid the actual loan. Since a conclusion
was reached, Certainty is 100%. Decisiveness
is computed to be 13.4%, using the formula described
above for Decisiveness given a TRUE decision
(Decisiveness was usually below 40% for
examples in the test database). In order to further
demonstrate, assume that only those values in Figure 7
which came from a credit report (i.e., values
for Active Accounts, New Accounts, # of Inquiries,
Public Record Items, Time in Credit File, and the
three Overdue variables) are known. Now there are
too many unknown inputs to allow a conclusion, so
the present invention produces a tentative decision
instead. The decision is to approve, with Decisiveness
of 10.1% and Certainty of 47.4%. These
values are computed using the Decisiveness and
Certainty formulas for TRUE decisions.

Referring again to Figure 1, the second internal
component is the Explanation module 23. This
module produces two types of explanation. One
type explains any decision, whether tentative or
conclusive, by computing the contribution each
known input value makes to that decision. This
type is Decision Explanation 25 and is based on
output states. The other type, Conclusion Explanation
27, examines output bounds to explain how a
conclusion was reached. Conclusion Explanation
computes the contributions made by known inputs
toward reaching the conclusion condition, and also
determines a minimal subset of those inputs that is
sufficient to support the conclusion.

Note that both types of contributions are computed
in the context of the current set of input
values. Thus the contribution made by a given
input variable to a given decision or conclusion is
dependent not just on the variable's own value, but
also on the values (or lack thereof) of the other
input variables. Thus, both types of explanation
capture the interdependency among inputs that is
found in feedforward networks with hidden layers.

Conclusion Explanation 27 computes a con- 
tribution for each known input variable that is equal 
to the decrease in the strength of the conclusion 
condition when the value of the variable is as- 
sumed to be unknown. That strength is measured 
as the distance between the bounds representing 
the conclusion condition and the actual bounds on 
the states underlying the output variable for which 
the conclusion has been reached. The operation of 
Conclusion Explanation is illustrated in the 
flowchart of Figure 3A with regard to a single 
output variable. The module and figure are discussed below.

In block 81 of Figure 3A, the present invention checks to see if a
conclusion has been reached for the output variable for which
explanation is desired. If no conclusion has been reached, Conclusion
Explanation is not possible in block 83. Otherwise, Conclusion
Explanation proceeds in block 85 by examining the first or next input
variable whose value is known. The value of that variable is temporarily
assumed to be unknown, and the bounds on the output states are
temporarily adjusted in block 87 to reflect that assumption. The
adjusted and true (before adjustment) output bounds are compared to
determine the contribution of the input variable being examined.

The method of computing the contribution is dependent on whether the
output variable is Boolean or symbolic in block 89. If the output
variable is Boolean, the contribution is further dependent on whether
the decision for that variable is TRUE or FALSE in block 91. If the
decision is TRUE, the conclusion condition is based on the low bound on
the state of the PE underlying the Boolean output variable, and thus the
contribution in block 93 is equal to the true low bound minus the
adjusted low bound on the underlying state. If the decision is FALSE,
the conclusion condition is based on the high bound, and thus the
contribution in block 95 is the adjusted high bound minus the true high
bound on the underlying state. Formally, the contributions equal:

S(o,lo) - S'(o,lo) for TRUE decisions
S'(o,hi) - S(o,hi) for FALSE decisions

where o is the underlying output PE, S signifies the true bounds, and S'
signifies the adjusted bounds.

If the output variable is symbolic, the conclusion condition is based on
the gap between the low bound on the state of the selected PE and the
highest high bound among the unselected PEs. Thus, for symbolic output
variables, the contribution in block 97 is equal to the gap given the
true bounds minus the gap given the adjusted bounds. Formally, the
contribution equals:

(S(o,lo) - S(o',hi)) - (S'(o,lo) - S'(o'',hi))

where o is the selected output PE, o' is the unselected PE with the
highest true high bound, and o'' is the unselected PE with the highest
adjusted high bound.
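
The two Conclusion Explanation formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: bounds are represented as hypothetical (low, high) tuples, output PEs of a symbolic variable are keyed by hypothetical names, and the function names are not taken from the specification.

```python
from typing import Dict, Tuple

Bounds = Tuple[float, float]  # assumed layout: (low bound, high bound) on one PE state


def boolean_conclusion_contribution(true_b: Bounds, adj_b: Bounds, decision: bool) -> float:
    """Loss of conclusion strength when one known input is treated as unknown."""
    if decision:
        # TRUE decision: S(o,lo) - S'(o,lo)
        return true_b[0] - adj_b[0]
    # FALSE decision: S'(o,hi) - S(o,hi)
    return adj_b[1] - true_b[1]


def bound_gap(bounds: Dict[str, Bounds], selected: str) -> float:
    """Low bound of the selected PE minus the highest high bound among the others."""
    highest_other = max(hi for name, (_, hi) in bounds.items() if name != selected)
    return bounds[selected][0] - highest_other


def symbolic_conclusion_contribution(
    true_b: Dict[str, Bounds], adj_b: Dict[str, Bounds], selected: str
) -> float:
    """Gap under the true bounds minus gap under the adjusted bounds."""
    return bound_gap(true_b, selected) - bound_gap(adj_b, selected)
```

Note that the unselected PE with the highest high bound is looked up separately for the true and the adjusted bounds, which corresponds to the distinction between o' and o'' in the formula.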

Note that all contributions computed by Conclusion Explanation will be
non-negative. This results from the combination of these facts: 1. the
contribution of an input variable is computed relative to that variable
being unknown; 2. when the value of an input variable becomes known, the
bounds on each output state either stay the same or become narrower,
where narrower bounds mean that the high bound has decreased and/or the
low bound has increased; 3. narrower output bounds can strengthen, but
not weaken, the conclusion condition; and 4. the contributions measure
the strengthening of the conclusion condition.

After computing the contribution for a particular input variable, the
present invention checks in block 99 to see whether all known input
variables have been examined. If not, the next known variable is
examined. Otherwise, Conclusion Explanation proceeds to block 101, where
the contributions for all known input variables are multiplied by a
scaling constant, determined such that the largest scaled contribution
is 100.0. The constant is calculated by dividing 100.0 by the maximum of
the unscaled contributions. In block 103, the scaled contributions,
along with the corresponding variable names and values, are produced as
output. The output is ordered numerically by contribution.

Conclusion Explanation further explains a conclusion by finding the
minimal premises, a subset of the known input values that, by
themselves, are sufficient for reaching the conclusion. Known inputs not
in the minimal premises could be unknown and the conclusion would still
hold. The conclusion and minimal premises can be viewed together as a
rule which states a set of sufficient conditions for reaching the
conclusion, independent of the case currently being analyzed.
The operation of Conclusion Explanation in finding the minimal premises
is illustrated in Figure 3B. The first step is to set k to the number of
known input variables and to initialize the count c to zero in block
111. Each time block 113 is reached, c is incremented by 1. The first
time through block 115, the present invention chooses the known input
variable with the smallest contribution. It assumes the value of that
variable to be unknown in block 117. In block 119, it checks to see if
the conclusion condition, as determined by the Inference module, still
holds in light of that assumption. If the conclusion is still supported,
the known input variable with the second smallest contribution is chosen
in a return to block 115. The value of this variable is also assumed to
be unknown. This process repeats, with c counting the number of known
input variables assumed to be unknown. When the conclusion no longer
holds in block 119, the minimal premises are determined in block 121.
Because the conclusion still held with c-1 variables assumed unknown,
the minimal premises are determined to consist of the k-c+1 inputs with
the largest contributions.
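
The loop of Figure 3B can be sketched as follows. This is a minimal illustration, not the specification's implementation: `conclusion_holds` is a hypothetical callback standing in for the Inference module, and it is assumed to accept the set of input names that are still treated as known.

```python
from typing import Callable, Dict, FrozenSet, List


def minimal_premises(
    contributions: Dict[str, float],
    conclusion_holds: Callable[[FrozenSet[str]], bool],
) -> List[str]:
    """Return the minimal premises, largest contribution first."""
    # Known inputs ordered from smallest to largest contribution.
    ordered = sorted(contributions, key=contributions.get)
    k = len(ordered)
    for c in range(1, k + 1):
        # The c inputs with the smallest contributions are assumed unknown.
        still_known = frozenset(ordered[c:])
        if not conclusion_holds(still_known):
            # The c-th removal broke the conclusion, so the k - c + 1
            # inputs with the largest contributions are kept.
            return ordered[c - 1:][::-1]
    # Assumption: if the conclusion holds with every input unknown,
    # no premises are needed at all.
    return []
```

For example, with four known inputs where the conclusion needs only the two largest contributors, the loop stops at c = 3 and returns k - c + 1 = 2 inputs.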

Figure 9 contains an example of the output of
Conclusion Explanation, using the sample loan ap- 
plicant described in Figure 7 and the network de- 
scribed in Figure 6. For each known input variable, 
the output lists the contribution and indicates 
whether the variable belongs to the minimal prem- 
ises. The contributions were computed using the 
formula described above for contributions given a 
TRUE decision. The minimal premises are distinguished in the output with
the heading Sufficient.
Known input variables that are not among the mini- 
mal premises are listed under Additional. Figure 9 
shows that the minimal premises, in this case, 
include all known input variables except for Time in 
Credit File. It also shows that the values of 30-Days 
Overdue Now and # of Inquiries were the most 
important factors in reaching the Approve conclu- 
sion. 

The second type of explanation, Decision Explanation 25, measures the
contribution made by each known input variable to a decision. This type
of explanation is best explained by contrasting it to Conclusion
Explanation 27. Both types of explanation measure the contribution made
by each known input variable by examining the change in output when the
value of the variable is assumed to be unknown. However, Conclusion
Explanation examines changes in output bounds, whereas Decision
Explanation examines changes in output states. This difference is
motivated by the fact that Conclusion Explanation measures contributions
to the conclusion condition, which is based on output bounds, while
Decision Explanation measures contributions to a decision, which is
based on output states.

An additional difference concerns the sign of the contributions. As
explained earlier, contributions in Conclusion Explanation cannot be
negative, because knowing the value of an input variable can only
strengthen the conclusion condition or leave it unchanged, relative to
not knowing the value. However, knowing the value of an input variable
can weaken as well as strengthen a decision, as indicated by a decrease
or increase in the Decisiveness confidence measure. Thus, contributions
in Decision Explanation can be negative or positive. As a matter of
fact, a particular value for an input variable can make a positive
contribution to a particular decision in one input context, yet make a
negative contribution to that same decision in another context. This is
due to the interdependency among inputs that is found in feedforward
networks with hidden layers.

The operation of Decision Explanation is illustrated in the flowchart of
Figure 4 with regard to a single output variable. The module and figure
are explained as follows: in block 131, the present invention examines
the first or next input variable whose value is known. The value of that
variable is temporarily assumed to be unknown, and the output states are
temporarily adjusted in block 133 to reflect that assumption. The
adjusted and true (before adjustment) output states are compared to
determine the contribution of the input variable being examined.

The method of computing the contribution is dependent on whether the
output variable is Boolean or symbolic in block 135. If the output
variable is Boolean, the contribution is further dependent on whether
the decision for that variable is TRUE or FALSE in block 137. If the
decision is TRUE, the contribution in block 139 is equal to the true
state minus the adjusted state of the PE underlying the Boolean output
variable. If the decision is FALSE, the contribution in block 141 is the
adjusted output state minus the true output state. Formally, the
contributions equal:

S(o) - S'(o) for TRUE decisions
S'(o) - S(o) for FALSE decisions

where o is the underlying output PE, S signifies the true state, and S'
signifies the adjusted state.

If the output variable is symbolic, the contribution in block 143 is
based on the gap between the state of the selected PE and the highest
state among the unselected PEs. The contribution is equal to the gap
given the true states minus the gap given the adjusted states. Formally,
the contribution equals:

(S(o) - S(o')) - (S'(o) - S'(o''))

where o is the selected output PE, o' is the unselected PE with the
highest true state, and o'' is the unselected PE with the highest
adjusted state.
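
As a rough illustration of the Decision Explanation formulas, the sketch below works on output states rather than bounds. The function names and the dictionary layout (output PE name mapped to state) are assumptions made for the example only.

```python
from typing import Dict


def boolean_decision_contribution(true_state: float, adj_state: float, decision: bool) -> float:
    """S(o) - S'(o) for a TRUE decision, S'(o) - S(o) for a FALSE decision."""
    return true_state - adj_state if decision else adj_state - true_state


def state_gap(states: Dict[str, float], selected: str) -> float:
    """State of the selected PE minus the highest state among the unselected PEs."""
    return states[selected] - max(s for name, s in states.items() if name != selected)


def symbolic_decision_contribution(
    true_states: Dict[str, float], adj_states: Dict[str, float], selected: str
) -> float:
    """Gap under the true states minus gap under the adjusted states."""
    return state_gap(true_states, selected) - state_gap(adj_states, selected)
```

Unlike the bound-based contributions of Conclusion Explanation, these values can come out negative, for instance when the adjusted state of a TRUE decision is higher than the true state.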

After computing the contribution for a particular input variable, the
present invention checks in block 145 to see whether all known input
variables have been examined. If not, the next known variable is
examined. Otherwise, Decision Explanation proceeds to block 147, where
the contributions for all known input variables are multiplied by a
scaling constant, determined such that the largest absolute value of any
contribution is 100.0. The constant is calculated by dividing 100.0 by
the maximum absolute value of the unscaled contributions. In block 149,
the scaled contributions, along with the corresponding variable names
and values, are produced as output. The output is ordered numerically by
contribution. In addition, the output distinguishes three groups of
known input variables: 1. those which make insignificant contributions
to the decision, i.e., contributions whose absolute values fall below a
user-chosen significance threshold; 2. those which significantly support
the decision, i.e., those with positive contributions at or above the
threshold; and 3. those which significantly weaken the decision, i.e.,
those with negative contributions whose absolute values are at or above
the threshold.
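
The scaling and grouping steps of blocks 147 and 149 might look like the sketch below. The group labels, function names, and the handling of the all-zero case are assumptions for illustration; the specification does not name them.

```python
from typing import Dict, List, Tuple


def scale_contributions(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale so that the largest absolute contribution becomes 100.0."""
    largest = max(abs(value) for value in raw.values())
    if largest == 0.0:
        # Assumption: with no non-zero contribution there is nothing to scale.
        return dict.fromkeys(raw, 0.0)
    factor = 100.0 / largest
    return {name: value * factor for name, value in raw.items()}


def group_by_significance(
    scaled: Dict[str, float], threshold: float
) -> Dict[str, List[Tuple[str, float]]]:
    """Split scaled contributions into the three groups, ordered by contribution."""
    groups: Dict[str, List[Tuple[str, float]]] = {
        "supporting": [], "weakening": [], "insignificant": []
    }
    for name, value in sorted(scaled.items(), key=lambda kv: kv[1], reverse=True):
        if abs(value) < threshold:
            groups["insignificant"].append((name, value))
        elif value > 0:
            groups["supporting"].append((name, value))
        else:
            groups["weakening"].append((name, value))
    return groups
```

With made-up raw contributions of 0.5, -0.125, and 0.03125 and a threshold of 20.0, the scaled values are 100.0, -25.0, and 6.25, landing in the supporting, weakening, and insignificant groups respectively.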

Figure 8 contains an example of the output of 
Decision Explanation, again using the loan appli- 
cant described in Figure 7 and the network de- 
scribed in Figure 6. The output was produced 
using a significance threshold of 20.0. The con- 
tributions listed in the output were computed using 
the formula described above for contributions given 
a TRUE decision. The output shows that there were 
four factors significantly supporting the Approve 
decision. The largest such factor was the value of 
New Accounts. The other three factors involve the 
Overdue variables. The only factor significantly 
weighing against approval was the value of Time in 
Credit File. Note in Figure 9 that Time in Credit File 
was also the least supportive input variable as 
indicated by Conclusion Explanation. It made the
smallest contribution and was the only known input 
variable not in the minimal premises. However, in 
general, there is only an indirect relationship be- 
tween a variable's ranking according to the two 
types of explanation, since each measures a dif- 
ferent property. 

Determining contributions in Decision Explanation requires computing
adjusted output states, based on the assumption that a known input
variable is unknown. However, computing the adjusted
states does not require network iterations. This 
contrasts with conventional sensitivity analysis, 
where the effect of changing an input is determined 
by iterating the network after making the change. 
The present invention includes a method for finding 
the modified output resulting from a change in 
input, while avoiding the computational expense of 
a network iteration. 

Computation of a network iteration involves all 
the weights in the network, but in accordance with 
the present invention, a simpler partial weights 
method is used which involves only a subset of the 
weights. This is important because the computer 
time required for the computation is roughly pro- 
portional to the number of weights used. Specifi- 
cally, the partial weights method, according to the 
present invention, uses only the weights associated 
with connections that lie in a path between one of the input PEs whose
state is being changed and one of the output PEs for which state change
is being measured.

Exactly which connections lie in such a path depends on the number of
hidden layers and on the connectivity of the network. In order to
provide a specific example, reference is made to the network in Figure
6. Suppose it is necessary to find the modified output state of this
network that results from changing the state of input PE i from v to v'.
The partial weights method uses only the weights for the connections
from i to each of the three hidden PEs and the connections from each
hidden PE to the output PE.
The first step in the partial weights computation for the network in
Figure 6 is to calculate the resulting net-input of each of the three
hidden PEs. The resulting net-input I'(h) for each hidden node h is
equal to I(h) + ((v' - v) * W(i,h)), where W(i,h) is the weight of the
connection from i to h and I(h) is the original net-input for h. The
next step is to compute the resulting state for each of the hidden PEs.
S'(h), the new state of h, is equal to T(I'(h)), where T is the
conventional sigmoid transfer function used by the network of Figure 6.
The modified output state is then computed by applying T to the weighted
sum of each S'(h).
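
A minimal sketch of the partial weights computation for a network with one hidden layer and a single output PE is given below. It assumes the logistic function for T and no bias terms, since none are mentioned in this passage; the parameter names are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Conventional sigmoid transfer function T (assumed logistic)."""
    return 1.0 / (1.0 + math.exp(-x))


def partial_weights_output(
    hidden_net: List[float],  # I(h): original net-input of each hidden PE
    w_in: List[float],        # W(i,h): weights from the changed input PE i to each hidden PE
    w_out: List[float],       # weights from each hidden PE to the output PE
    v_old: float,             # v: original state of input PE i
    v_new: float,             # v': new state of input PE i
) -> float:
    """Modified output state after changing one input, without a full iteration."""
    # I'(h) = I(h) + (v' - v) * W(i,h), then S'(h) = T(I'(h)).
    new_hidden = [
        sigmoid(net + (v_new - v_old) * w) for net, w in zip(hidden_net, w_in)
    ]
    # The output state is T applied to the weighted sum of the S'(h).
    return sigmoid(sum(state * w for state, w in zip(new_hidden, w_out)))
```

Only the weights leaving the changed input PE and the weights entering the output PE are touched, so for three hidden PEs the update uses six weights regardless of how many other inputs the network has.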

Using the partial weights method, according to the present invention,
each of the decision contributions in Figure 8 was computed with only 6
weights. Without the method, each contribution would have required
computing a network iteration, which uses 42 weights. The partial
weights method is also used to compute the changes in decisions and
Decisiveness that result when the user adds, modifies, or deletes input
values.

In addition to explaining decisions and conclusions, the Explanation
module 23 has the ability to do Rule Generation 31. This process
produces a set of if-then rules, each of which consists of some number
of possible input values and a decision. A rule is interpreted to mean
that the decision would be supported as a conclusion if the input values
were known to be true. These rules are intended to approximately
represent the knowledge implicit in the weights of the network. By
making the relationship between input values and decisions explicit, the
knowledge in the network is made explicit. The conversion from weights
to rules results in some loss of precision, but the rules are intended
only as an aid to a user who wishes to analyze the knowledge contained
in the network, and are not a requirement for any of the other
capabilities of the present invention.
A key to Rule Generation is the minimal premises 29 described earlier.
The input values in a rule constitute a set of minimal premises for
reaching the conclusion in the rule. In other words, Rule Generation
constructs rules such that if any of the input values in the rule were
removed, the conclusion of the rule would no longer hold. Rule
Generation constructs rules by producing sets of minimal premises to
support hypothetical conclusions, where the premises consist of
hypothetical (but possible) input values. Specifically, rules are
generated for an output variable by finding sets of hypothetical minimal
premises that can support one of the possible decisions for that
variable as a conclusion. Each unique set of minimal premises found
results in a rule. All possible sets are found and thus all possible
rules are generated, restricted only by the user's option to specify a
maximum number of premises per rule.

All possible sets of minimal premises are found by doing a search of
input space. Specifically, Rule Generation uses a conventional
depth-first tree search. In the context of the present invention, the
paths through the simulated tree correspond to possible combinations of
input values. Each level in the tree corresponds to a different input
variable. Each node in the tree has n + 1 branches, where n is the
number of possible values for the input variable corresponding to the
level at which the node exists. One branch corresponds to each possible
value and the extra branch corresponds to the value being unknown. For
Boolean input variables, the possible values are TRUE and FALSE, so n is
2. For symbolic variables, n is the number of possible symbols. For
continuous variables, there are an infinite number of possible values,
so a representative sample of the values must be chosen. The solution in
the present invention is to choose three values representative of
different regions within the value range of a given continuous input
variable. The specific method for choosing these values is explained
later.

Searching along a particular path in the tree is interpreted as choosing
the value (possibly UNKNOWN) corresponding to each branch in the path.
Thus, when the search reaches m levels down in the tree, the values of m
input variables have been either set or said to be unknown. The search
follows each possible path until the set values result in a conclusion
or the bottom of the tree is reached. If the bottom is reached without a
conclusion, then that path produces no rule. However, if a conclusion is
reached, the known values set in that path, along with the conclusion,
constitute a candidate rule. The candidate rule will be chosen only if
its premises are minimal, i.e., if all of the known values set in the
path are required to support the conclusion. Rule Generation checks to
see if any of the known values can be assumed to be unknown without a
resulting loss of the conclusion. If so, the premises are not minimal,
and the candidate rule is thrown away. Otherwise, the candidate rule is
included in the output of Rule Generation.
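
The depth-first search described above can be sketched as follows. This is an illustrative reading, not the specification's code: `conclude` is a hypothetical stand-in for the Inference module that returns a conclusion (or None) for a set of known values, and `domains` is an assumed mapping from each input variable to its candidate values (for continuous variables, the three representative values).

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Assignment = Dict[str, object]  # known values only; absent variables are unknown
Rule = Tuple[Assignment, object]


def generate_rules(
    domains: Dict[str, Sequence[object]],
    conclude: Callable[[Assignment], Optional[object]],
    max_premises: Optional[int] = None,
) -> List[Rule]:
    """Depth-first search of input space for rules with minimal premises."""
    names = list(domains)
    rules: List[Rule] = []

    def is_minimal(known: Assignment, conclusion: object) -> bool:
        # Premises are minimal if dropping any single known value loses the conclusion.
        for name in known:
            reduced = {n: v for n, v in known.items() if n != name}
            if conclude(reduced) == conclusion:
                return False
        return True

    def search(level: int, known: Assignment) -> None:
        conclusion = conclude(known)
        if conclusion is not None:
            # Candidate rule: keep it only if its premises are minimal.
            if is_minimal(known, conclusion):
                rules.append((dict(known), conclusion))
            return
        if level == len(names):
            return  # bottom of the tree reached without a conclusion
        name = names[level]
        search(level + 1, known)  # extra branch: value left unknown
        if max_premises is None or len(known) < max_premises:
            for value in domains[name]:  # one branch per possible value
                search(level + 1, {**known, name: value})

    search(0, {})
    return rules
```

Because each node tries the unknown branch plus every candidate value, the tree has n + 1 branches per node, and the optional `max_premises` cap prunes paths that would exceed the user's limit on premises per rule.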

Turning now to the precise method for choosing the three sample values
for each continuous input variable, Rule Generation chooses a value from
the upper, lower, and middle regions of the variable's value range. The
chosen values are referred to as High, Low, and Medium, and are computed
from the input variable statistics 13. The Medium value is equal to the
mean of the input variable. The values for High and Low depend on which
statistics are used. If the minimum and maximum statistics are used, the
following values are chosen for continuous input variable i:

Low = (3 * Min(i) + Max(i)) / 4
High = (3 * Max(i) + Min(i)) / 4

where Min(i) and Max(i) are the minimum and maximum values of i. The
chosen values correspond to points one-quarter and three-quarters of the
way through the interval [Min(i), Max(i)].

If the mean and standard deviation statistics are used, the following
values are chosen:

Low = Mean(i) - (z * Sd(i))
High = Mean(i) + (z * Sd(i))

where Mean(i) and Sd(i) are the mean value and standard deviation of
variable i, and z is a positive value chosen by the user that applies to
all continuous input variables.
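
Both ways of picking the Low, Medium, and High values can be written directly from the formulas. The function names below are hypothetical, and the returned dictionary layout is an assumption for the example.

```python
from typing import Dict


def samples_from_range(minimum: float, maximum: float, mean: float) -> Dict[str, float]:
    """Low and High at one-quarter and three-quarters of [Min(i), Max(i)]."""
    return {
        "Low": (3.0 * minimum + maximum) / 4.0,
        "Medium": mean,
        "High": (3.0 * maximum + minimum) / 4.0,
    }


def samples_from_spread(mean: float, sd: float, z: float) -> Dict[str, float]:
    """Low and High at z standard deviations either side of the mean."""
    return {"Low": mean - z * sd, "Medium": mean, "High": mean + z * sd}
```

For a variable ranging from 0 to 8 with mean 3, the range-based version gives Low = 2, Medium = 3, and High = 6.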

Figure 11 contains a typical rule produced by Rule Generation, using the
method described above. The loan-scoring problem is again used as an
example. Since this problem involves continuous input variables, it
provides a good example of the present invention's method for using
continuous variables as rule premises, i.e., the method of using High,
Low, and Medium values. The rule in Figure 11 specifies six input values
which are sufficient for supporting an Approve conclusion. If an
applicant has those six values, he can be assured of approval,
regardless of his values for the remaining seven input variables.

Referring again to Figure 1, the third internal component is the
Intelligent Knowledge Acquisition module 33. The term 'intelligent
knowledge acquisition' refers to a systematic method for determining
what knowledge is most profitable to pursue. In the context of the
present invention, this involves estimating which unknown input
variables will have the greatest impact on a particular output variable
if their values become known. By pursuing values for the unknown input
variables with the greatest influence, the user can decrease the number
of additional inputs needed to reach a conclusion. This is especially
important in domains, such as medical diagnosis, where there can be a
significant cost, risk, or time delay associated with gathering
information.

The operation of the Intelligent Knowledge Ac- 
quisition (IKA) module is illustrated in the flowchart 
of Figure 5. The module measures the potential 
influence of each unknown input variable 33 on a 
user-chosen output variable. In block 161 of Figure 
5, the IKA module examines the first or next input 
variable whose value is unknown. Test values are 
determined for the variable in block 163. If the 
variable is Boolean or symbolic, each possible val- 
ue of the variable is used as a test value. If the 
variable is continuous, two test values are chosen. 
They are computed using the same formulas by 
which High and Low values are chosen for continu- 
ous input variables in Rule Generation 31. 

After the test values for the input variable are 
determined, an influence total is initialized to zero 
in block 165. The first or next test value is chosen 
in block 167 and the input variable is temporarily 
assumed to have that value. The module computes 
the contribution made by that input value to the 
decision for the user-chosen output variable. The 
contribution is computed using the same formulas 
by which Decision Explanation 25 computes con- 
tributions. The absolute value of the computed con- 
tribution is added to the influence total in block 
169. If all test values have been used in block 171,
an influence measure for the input variable is com- 
puted in block 173 by dividing the influence total 
by the number of test values. The influence of an 
unknown variable is thus computed to be the mean 
absolute value of the contributions made by the 
variable's test values. 

After computing the influence of a variable, the IKA module checks to
see, in block 175, if all unknown input variables have been examined. If
not, the next unknown variable is examined. Otherwise, the module
proceeds to block 177, where the influence measures for all unknown
input variables are multiplied by a scaling constant, determined such
that the largest measure is 100.0. The constant is calculated by
dividing 100.0 by the maximum of the unscaled measures. In block 179,
the scaled influence measures, along with the corresponding variable
names, are produced as output. The output is ordered numerically by
influence measure.
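
The influence computation of Figure 5 can be sketched as below. The `contribution` callback is a hypothetical stand-in for the Decision Explanation formulas applied to one test value, and `test_values` is assumed to be a non-empty mapping from each unknown input variable to its test values.

```python
from typing import Callable, Dict, Sequence


def influence_measures(
    test_values: Dict[str, Sequence[object]],
    contribution: Callable[[str, object], float],
) -> Dict[str, float]:
    """Scaled mean absolute contribution per unknown variable, largest first."""
    raw: Dict[str, float] = {}
    for name, values in test_values.items():
        total = 0.0  # influence total for this variable
        for value in values:
            total += abs(contribution(name, value))
        raw[name] = total / len(values)  # mean absolute contribution
    largest = max(raw.values())
    # Assumption: if every measure is zero, leave them all at zero.
    factor = 100.0 / largest if largest > 0.0 else 0.0
    scaled = ((name, measure * factor) for name, measure in raw.items())
    return dict(sorted(scaled, key=lambda kv: kv[1], reverse=True))
```

Taking absolute values before averaging means a variable whose test values push the decision in opposite directions still registers as influential rather than cancelling out.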

Figure 10 contains an example of the output of 
the IKA module, again using the loan applicant 
described in Figure 7 and the network described in 
Figure 6. As in an earlier example, this example 
assumes that we know only those values listed in 
Figure 7 that came from a credit report. That leaves five unknown input
variables. The output in Figure 10 shows the influence measures computed
for those five variables, using the method illustrated in the flowchart
of Figure 5. The measures indicate that knowing the value of Bankcard
Accounts or Time at Job would be most valuable for reaching a
conclusion, whereas knowing the value of Time at Address would be least
valuable.

Note that the influence measures computed by the IKA module are
dependent on the current set of input values, as is the case with the
contributions computed by the Explanation module 23. Thus, an unknown
input variable may be important in some situations, as indicated by a
large influence measure, and unimportant in others.

Claims



1. A method of operating an expert system based on a trained
continuous-state feedforward neural network, where the network includes
a layer of input processing elements (PEs), a layer of output PEs, and
any number of layers of hidden PEs, where the input PEs are
representative of input variables and the output PEs are representative
of output variables, the method comprising the steps of:

determining a decision for each output variable based on the states of
the underlying PEs;

determining a conclusion for each output variable when, as inputs to the
network become known, the decision becomes irreversible;

computing confidence measures for each decision;

explaining each decision by computing the contribution made toward the
decision by each input variable with a known value;

explaining each conclusion by computing the contribution made toward
reaching the conclusion by each input variable with a known value;

determining the potential influence on a decision of each input variable
with an unknown value; and

converting the knowledge implicit in the neural network into an explicit
set of if-then rules.

2. A method for computing high and low bounds on the states of hidden
and output PEs in a feedforward neural network, wherein these bounds
serve as limits on the possible range of each such state, given known
input states and any combination of possible states for some or all of
the input PEs whose states are currently unknown, the method comprising
the steps of:

determining the upper and lower bounds on the values of continuous input
variables, using the input variable distribution statistics; and

computing the bounds on the states of the output PEs recursively from
the bounds on the input variables and the bounds on the states of the
hidden PEs.

3. A method according to claim 2 for determining when a conclusion can
be made regarding the value of an output variable in a feedforward
neural network with continuous-state PEs, despite some number of unknown
input states, the method comprising the steps of:

determining the high and low bounds on the output states;

using the high and low bounds on the output states to determine if the
tentative decision for the output variable can ever change; and

producing a conclusion if the tentative decision cannot change.

4. A method for computing a confidence measure of decisiveness for a
decision corresponding to the value of an output variable in a
feedforward neural network, the method comprising the steps of:

determining the network output states given a set of known input values;
and

comparing the output states with the selected threshold or other
decision criteria and quantitatively measuring the closeness of the
output states to the decision criteria.

5. A method according to claim 2 for computing a confidence measure of
certainty for a decision corresponding to the value of an output
variable in a feedforward neural network, the method comprising the
steps of:

determining the bounds on the states of the output PEs;

comparing the output bounds with the selected threshold or other
decision criteria and quantitatively measuring the closeness of the
bounds to the decision criteria to determine the probability that the
output decision will not be reversed.

6. A method of explaining the reasons for a neural network output
decision, the method comprising the steps of:

determining the network output states on a set of known input values;

comparing the output states with the decision criteria and determining
an output decision;

sequentially converting each known input value to an unknown condition;

for each input value which is converted from known to unknown, measuring
the change in the output state relative to the decision criteria; and

ranking quantitatively the importance of each known input value in
determining the output decision.

7. A method of explanation according to claim 3 for measuring the
contribution made toward reaching the conclusion condition by each input
variable with a known value in a feedforward neural network, the method
comprising the steps of:

determining the upper and lower output state bounds;

sequentially converting each known input value to the unknown condition;

for each input value which is converted from known to unknown, measuring
the change in the upper and lower output bounds relative to the decision
criteria; and

ranking quantitatively the importance of each known input value in
reaching the conclusion condition.

8. A method according to claim 3 for computing a minimal subset of
current input values that is sufficient to support the conclusion
condition in a continuous-state feedforward neural network, the method
comprising the steps of:

evaluating the relative changes in the output bounds for each known
input value; and

selecting the minimum set of the most important inputs which are
sufficient to reach the conclusion condition.

9. A method of measuring the potential influence an unknown input
variable may have on the decision corresponding to the output of a
feedforward neural network if the value of the variable becomes known,
the method comprising the steps of:

for each unknown input value, sequentially determining the change in the
output state as each input is varied over its possible value range; and

comparing quantitatively the changes in output states relative to the
decision criteria for each unknown input.

10. A method according to claim 3 for using the 
conclusion generating process to translate the 
knowledge implicit in a continuous-state feed- 
forward neural network into an explicit set of if- 
then rules supporting possible conclusions, the 
method comprising the steps of: 

determining a minimum set of input values 
which will result in reaching the conclusion 
condition, where the set of input values then 
constitute a rule for the particular conclusion; 
and 

searching input values for the set of all 
possible rules which support any conclusion 
for a particular output variable. 
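
The rule search of claim 10 can be sketched as an exhaustive enumeration. The sketch assumes a single sigmoid PE, a conclusion condition of "lower output bound exceeds a threshold", and a small list of candidate values per input. The names `lower_bound` and `find_rules` are hypothetical, and the search shown grows exponentially with the number of inputs, so it only illustrates the idea.

```python
import math
from itertools import combinations, product

def lower_bound(weights, bias, assignment, ranges):
    """Lower output bound of one sigmoid PE; inputs absent from
    `assignment` are unknown and take their worst-case value."""
    net = bias
    for i, (w, (lo, hi)) in enumerate(zip(weights, ranges)):
        v = assignment.get(i)
        if v is None:
            v = lo if w >= 0 else hi
        net += w * v
    return 1.0 / (1.0 + math.exp(-net))

def find_rules(weights, bias, ranges, candidates, threshold):
    """Enumerate minimal input-value sets that force the conclusion."""
    rules = []
    n = len(weights)
    for size in range(1, n + 1):                      # smallest sets first
        for subset in combinations(range(n), size):
            for vals in product(*(candidates[i] for i in subset)):
                rule = dict(zip(subset, vals))
                # Skip any set that merely extends a rule already found.
                if any(r.items() <= rule.items() for r in rules):
                    continue
                if lower_bound(weights, bias, rule, ranges) > threshold:
                    rules.append(rule)
    return rules

# Hypothetical three-input PE with candidate values 0.0 and 1.0 per input.
weights = [3.0, 2.0, -1.0]
ranges = [(0.0, 1.0)] * 3
candidates = [[0.0, 1.0]] * 3
rules = find_rules(weights, -3.0, ranges, candidates, threshold=0.5)
```

Each returned dictionary maps input indices to values and reads as one if-then rule: if those inputs take those values, the conclusion holds whatever the other inputs are.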

11. A method of using the values of continuous 
input variables in the rules according to claim 
10 wherein specific values are chosen to be 
representative of the upper, lower, and middle 
statistical regions of the variable's value range, 
and those values are examined for possible 
inclusion in one or more rules. 
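
The claim does not say how the representative values are chosen. One plausible reading, offered purely as an assumption, is to split the observed values of a variable into thirds and take the median of each third. The function name `representative_values` and the sample data are hypothetical.

```python
from statistics import median

def representative_values(samples):
    """Pick one value for each of the lower, middle and upper thirds.

    Assumption: the "statistical regions" are equal-count thirds of the
    sorted samples, each represented by its median.
    """
    ordered = sorted(samples)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three samples")
    third = n // 3
    return {
        "LOW": median(ordered[:third]),
        "MEDIUM": median(ordered[third:n - third]),
        "HIGH": median(ordered[n - third:]),
    }

# Hypothetical "Time at Job" observations, in months.
levels = representative_values([2, 6, 12, 18, 24, 36, 60, 84, 120])
```

The three resulting values can then be tried as candidate conditions for a continuous variable during rule search, and reported under the labels LOW, MEDIUM and HIGH as in Figure 11.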

12. A method according to claim 6 for using a 
subset of the weights of a feedforward neural 
network to compute the modified output states 
that result from an actual change or an assumed 
change in input values, the method 
comprising the additional steps of using the 
weights associated with connections that lie in 
a path between one of the input PEs whose 
state is being changed and one of the output 
PEs for which a modified state is being computed 
to compute the modified states. 
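
The idea of claim 12, updating outputs from only the weights on a path from the changed input, might be sketched as below for a network with one hidden layer. The class name `CachedNet`, the caching of net-inputs, and the treatment of a zero weight as "no connection" are all assumptions made for this illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class CachedNet:
    """Hypothetical one-hidden-layer network that caches its net-inputs."""

    def __init__(self, w_hidden, b_hidden, w_out, b_out, inputs):
        self.w_hidden, self.w_out = w_hidden, w_out
        self.inputs = list(inputs)
        self.hidden_net = [b + sum(w * x for w, x in zip(row, inputs))
                           for row, b in zip(w_hidden, b_hidden)]
        self.hidden = [sigmoid(n) for n in self.hidden_net]
        self.out_net = [b + sum(w * h for w, h in zip(row, self.hidden))
                        for row, b in zip(w_out, b_out)]

    def outputs(self):
        return [sigmoid(n) for n in self.out_net]

    def change_input(self, i, new_value):
        """Update outputs using only weights on paths leaving input i."""
        delta = new_value - self.inputs[i]
        self.inputs[i] = new_value
        for j, row in enumerate(self.w_hidden):
            if row[i] == 0.0:
                continue              # assumed: zero weight means no connection
            self.hidden_net[j] += row[i] * delta
            new_h = sigmoid(self.hidden_net[j])
            dh = new_h - self.hidden[j]
            self.hidden[j] = new_h
            for k, out_row in enumerate(self.w_out):
                self.out_net[k] += out_row[j] * dh
        return self.outputs()

# Hypothetical weights; the second hidden PE is not connected to input 1.
w_hidden = [[0.5, -1.0, 0.0], [1.0, 0.0, 2.0]]
b_hidden = [0.1, -0.2]
w_out, b_out = [[1.5, -0.5]], [0.0]
net = CachedNet(w_hidden, b_hidden, w_out, b_out, [1.0, 0.0, 0.5])
updated = net.change_input(1, 1.0)
```

The incremental result agrees with a full recomputation on the new inputs, while touching only one column of the hidden-layer weights.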
'?' indicates a missing value. 

'Accounts' refers to sources of credit such as loans and credit cards. 

VARIABLE NAME                                     APPLICANT'S VALUES

Active Accounts                                   1
(# of accounts currently open)

New Accounts
(# of accounts opened in the last 18 months)

# of Inquiries
(# of inquiries made into applicant's
credit history in the last 6 months)

Public Record Items
(# of derogatory items in the credit report,
e.g., bankruptcies and tax liens)

Bankcard Accounts
(# of bank credit cards
listed as credit references)

Retail Accounts
(# of retail store credit cards
listed as credit references)

30-Days Overdue Now                               0
(# of accounts for which payment
is currently 30 or more days overdue)

90-Days Overdue Ever                              0
(# of accounts for which payment has
ever been 90 or more days overdue)

% Ever 60-Days Overdue                            0.0
(percentage of accounts for which payment
has ever been 60 or more days overdue)

Applicant's Age                                   40
(age in years)

Time at Address                                   276
(months at current home address)

Time at Job
(months at present job)

Time in Credit File                               25
(months in credit-reporting agency's file)






VARIABLE                     VALUE    CONTRIBUTION TO LOAN APPROVAL

Supporting Factors:
New Accounts                 0        100.0
30-Days Overdue Now          0         49.8
% Ever 60-Days Overdue       0.0       34.9
90-Days Overdue Ever         0         24.0

Insignificant Factors:
Bankcard Accounts            1         18.8
Time at Address              276       13.6
Public Record Items          0          5.3
Applicant's Age              40         5.1
Active Accounts              1          1.2
# of Inquiries               2         -6.7

Weakening Factors:
Time in Credit File          25       -49.7

Figure 8
VARIABLE                     VALUE    CONTRIBUTION TO LOAN APPROVAL

Sufficient:
30-Days Overdue Now          0        100.0
# of Inquiries               2         96.3
Bankcard Accounts            1         78.0
New Accounts                 0         65.6
Active Accounts              1         54.2
% Ever 60-Days Overdue       0.0       44.7
Public Record Items          0         33.1
90-Days Overdue Ever         0         25.8
Time at Address              276       20.9
Applicant's Age              40        17.7

Additional:
Time in Credit File

Figure 9
VARIABLE                     INFLUENCE

Bankcard Accounts            100.0
Time at Job                   96.5
Retail Accounts               62.9
Applicant's Age               50.6
Time at Address               31.7
IF New Accounts = LOW
   AND 30-Days Overdue Now = LOW
   AND % Ever 60-Days Overdue = LOW
   AND Active Accounts = MEDIUM
   AND Bankcard Accounts = HIGH
   AND Time at Job = HIGH
CONCLUDE Approve = TRUE

Figure 11
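
A rule such as the one in Figure 11 can be applied outside the network once input values have been mapped to LOW, MEDIUM or HIGH. The sketch below encodes the figure's conditions directly; the dictionary layout, the function name `fires` and the example applicant are hypothetical.

```python
# Conditions and conclusion transcribed from Figure 11.
RULE = {
    "conditions": {
        "New Accounts": "LOW",
        "30-Days Overdue Now": "LOW",
        "% Ever 60-Days Overdue": "LOW",
        "Active Accounts": "MEDIUM",
        "Bankcard Accounts": "HIGH",
        "Time at Job": "HIGH",
    },
    "conclusion": ("Approve", True),
}

def fires(rule, facts):
    """True when every condition of the rule matches the known facts.

    A variable missing from `facts` is treated as unknown, so the rule
    does not fire on it.
    """
    return all(facts.get(name) == level
               for name, level in rule["conditions"].items())

# Hypothetical applicant, already discretized into LOW / MEDIUM / HIGH.
applicant = {
    "New Accounts": "LOW",
    "30-Days Overdue Now": "LOW",
    "% Ever 60-Days Overdue": "LOW",
    "Active Accounts": "MEDIUM",
    "Bankcard Accounts": "HIGH",
    "Time at Job": "HIGH",
    "Applicant's Age": "MEDIUM",   # not mentioned by the rule, so ignored
}
approved = fires(RULE, applicant)
```

Variables that the rule does not mention have no effect on whether it fires, which matches the reading of a rule as a sufficient set of input values.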